Fix SplitVariants task in TasksGenotypeBatch.wdl to be compatible with downstream analysis #647

Closed
wants to merge 10 commits into from

kirtanav98
Contributor

The SplitVariants task previously included lines that swapped columns 5 and 6 of its BED file output, which is read by downstream tasks of TrainRDGenotyping.GenotypePESR. Without that swap, TrainRDGenotyping.GenotypePESR fails with the following error:

Error: WARNING: Incorrect CNV type specified
1: stop("WARNING: Incorrect CNV type specified")
The Python script splitvariants.py was modified to swap these columns into the order that the downstream analysis expects.
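For readers unfamiliar with the change, the column swap described above can be sketched roughly as follows. This is an illustration only, not the code from splitvariants.py: the function name `swap_bed_columns` and the example record are hypothetical, and the indices assume that "columns 5 and 6" are 1-based (0-based fields 4 and 5).

```python
def swap_bed_columns(line, i=4, j=5):
    """Return a tab-delimited line with 0-based fields i and j exchanged.

    Hypothetical helper: the defaults assume that columns 5 and 6
    (1-based) are the ones the downstream tasks need swapped.
    """
    fields = line.rstrip('\n').split('\t')
    if len(fields) <= max(i, j):
        raise ValueError(f"expected at least {max(i, j) + 1} columns, got {len(fields)}")
    # Exchange the two fields in place using tuple assignment
    fields[i], fields[j] = fields[j], fields[i]
    return '\t'.join(fields)


# Made-up record for illustration; the real BED layout may differ
example = "chr1\t100\t200\tvar1\tDEL\tsampleA\n"
print(swap_bed_columns(example))
```

The length check is a design choice for the sketch: it turns a malformed, short line into an explicit error instead of an `IndexError` deep inside the loop.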

epiercehoffman marked this pull request as draft on February 26, 2024 16:43
# array and increments the counter for that array
line = line.strip('\n').split('\t')
line[4], line[5] = line[5], line[4]
SVTYPE_FIELD = 5
Collaborator

Instead of reassigning SVTYPE_FIELD here, you should either set SVTYPE_FIELD to 5 at the beginning or (my preference) move the code that swaps the fields to right before you append a new line to current_lines

Contributor Author

This was addressed and the value was set to 5.
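For comparison, the reviewer's preferred alternative (matching on the original column order and swapping only right before the line is appended to current_lines) might look roughly like the sketch below. It is not the code that was committed: `route_line`, the simplified `conditions` mapping, and the input index `SVTYPE_FIELD = 4` are assumptions made for illustration.

```python
# Assumption: 0-based index of the SV type in the *input* order, set once
SVTYPE_FIELD = 4

# Simplified, hypothetical conditions; the real script uses a richer mapping
conditions = {
    'del': lambda fields: fields[SVTYPE_FIELD] == 'DEL',
    'ins': lambda fields: fields[SVTYPE_FIELD] == 'INS',
}


def route_line(raw_line, conditions, current_lines, current_counts):
    """Append the line to the first matching prefix and return that prefix."""
    fields = raw_line.rstrip('\n').split('\t')
    for prefix, condition in conditions.items():
        if condition(fields):
            # Swap columns 5 and 6 (1-based) only for the output copy, so
            # SVTYPE_FIELD never has to be reassigned mid-script
            out = list(fields)
            out[4], out[5] = out[5], out[4]
            current_lines[prefix].append('\t'.join(out))
            current_counts[prefix] += 1
            return prefix
    return None


current_lines = {prefix: [] for prefix in conditions}
current_counts = {prefix: 0 for prefix in conditions}
matched = route_line("chr1\t100\t200\tvar1\tINS\tsampleA\n",
                     conditions, current_lines, current_counts)
print(matched, current_lines['ins'])
```

With this arrangement the conditions always see the input column order, and the swap is confined to the single place where output lines are built.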

'ins': {'condition': lambda line: bca and line[SVTYPE_FIELD] == 'INS'}
}

current_lines = {prefix: [] for prefix in condition_prefixes.keys()}
current_counts = {prefix: 0 for prefix in condition_prefixes.keys()}
current_suffixes = {prefix: 'a' for prefix in condition_prefixes.keys()}

# Open the bed file and process
Collaborator

Please keep the comments throughout the script to help document the code's functionality

Contributor Author

More comments were added.

# Checks which condition and prefix the current line matches and appends it to the corresponding
# array and increments the counter for that array
line = line.strip('\n').split('\t')
line[4], line[5] = line[5], line[4]
Collaborator

This needs a comment explaining what it's doing

Contributor Author

A comment was added.

kirtanav98 closed this on Mar 1, 2024
kirtanav98 deleted the kv_splitvariants_fix branch on March 1, 2024 12:49